CAUSAL INTERPRETATION OF STOCHASTIC DIFFERENTIAL EQUATIONS

ALEXANDER SOKOL AND NIELS RICHARD HANSEN 

Abstract. We give a causal interpretation of stochastic differential equations
(SDEs) by defining the postintervention SDE resulting from an intervention in an
SDE. We show that under Lipschitz conditions, the postintervention SDE is equal
to a uniform limit in probability of postintervention structural equation models
based on the Euler scheme of the original SDE, thus relating our definition to
mainstream causal concepts. We prove that when the driving noise in the SDE is a
Lévy process, the postintervention distribution is identifiable from the
observational distribution. Also for the case of Lévy driving noise, we relate
our results to the notion of weak conditional local independence (WCLI) by
proving that if a coordinate $X^i$ is locally unaffected by an intervention in
another coordinate $X^j$, then $X^i$ is WCLI of $X^j$.



1. Introduction

The notion of causality has long been of interest to both statisticians and
scientists working in fields applying statistics. In general, causal models are
models containing families of possible distributions of the variables observed
as well as appropriate mathematical descriptions of causal structures in the
data. Thus, claiming that a causal model is true amounts to claiming more than
statements about the distribution of the variables observed. Causal modeling has
several goals, prominent among them are:






(1) Estimation of intervention effects from partially observed systems with a 
given causal structure. 

(2) Identification of the causal structure from observational data. 

One of the most developed theories of causal inference is the DAG-based approach
for finitely many variables with no explicit time component, described in [32] and
[22]. In recent years, there have been efforts to develop notions of causality for



2010 Mathematics Subject Classification. Primary 60H10; Secondary 62A01. 
Key words and phrases. Stochastic differential equation, Causality, Structural equation
model, Identifiability, Lévy process, Weak conditional local independence.





stochastic processes, both in discrete time and in continuous time. For
discrete-time results, see for example [9], [10], [11] and [12]. Since
discrete-time models are often defined through explicit functional relationships
between variables, as is the case for autoregressive processes, such models fit
directly into the DAG-based framework. In the continuous-time framework, the
uncountably infinite number of variables complicates the question of how to
describe causal relationships.

Early discussions of causality in a continuous-time framework can be found in [15],
[13] and [5]. One of the most recent frameworks for causality in continuous time is
based on the concept of weak conditional local independence. For results related
to this, see [7], [4], [14], [30] and [31]. An alternative notion of causality defined
solely through filtrations is developed in [25] and [24].

In Section 4.1 of [1] it is noted that both ordinary differential equations and
stochastic differential equations (SDEs) allow for a natural interpretation in
terms of "influence", and that interventions may be defined by substitutions in
the differential equations. In this paper, we make these ideas precise. Our main
contributions are:

(1) For a given SDE, we give a precise definition of the postintervention SDE 
resulting from an intervention. 

(2) We show that under certain regularity assumptions, the postintervention 
SDE is the limit of a sequence of interventions in structural equation mod- 
els based on the Euler scheme of the SDE. 

(3) We prove that for SDEs with a Lévy process as the driving semimartingale,
the postintervention distribution is identifiable from the observational
distribution.

(4) We relate our results to weak conditional local independence (WCLI) by
showing that for SDEs with a Lévy process as the driving semimartingale,
$X^i$ is WCLI of $X^j$ if $X^i$ is locally unaffected by an intervention in $X^j$.

In matters of causality, it is important to distinguish clearly between
definitions, theorems and interpretations. Our definition of interventions in
SDEs will be a purely mathematical construct. It will, however, have a natural
causal interpretation. Given an SDE model, in order to use the definition of
intervention given here to predict the effects of real-world interventions, it is
necessary that the SDE can be sensibly interpreted as a data-generating mechanism
with certain properties. Specifically, as we will argue in Section 4, it is
necessary that the driving semimartingales are autonomous in the sense that they
may be assumed to be locally unaffected by interventions. This assumption is not
testable from a statistical viewpoint. It is, nonetheless, an assumption which
must be justified by other means in concrete cases.

The remainder of the paper is organized as follows. In Section 2 we motivate
and introduce our notion of intervention for SDEs. In Section 3 we review the
terminology of causal inference as developed in [22] and [32], based on structural
equation models and directed acyclic graphs. Section 4 shows that under certain
conditions, our notion of intervention is equivalent to taking a limit of
interventions in the context of structural equation models based on the Euler
scheme of the SDE. In Section 5 we give conditions for postintervention
distributions to be identifiable from the observational distribution. Section 6
relates our work to weak conditional local independence. Finally, in the
concluding section we discuss our results.



2. Interventions for stochastic differential equations 

Consider a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$
satisfying the usual conditions; see [26] for the definition of this and other
notions related to continuous-time stochastic processes. In this section we
introduce a causal interpretation of stochastic differential equations. In
general, the precise meaning of "causation" is a point of contemporary debate,
see for example [6]. For our purposes, it suffices to take a practical
standpoint: the causal structure of a system is sufficiently elucidated if we
know the effects of making interventions in the system. To motivate our
definition, we begin by investigating a simple example.

Example 2.1. Chemical kinetics is concerned with the dynamic evolution of the
concentrations of chemicals given in terms of a number of coupled chemical
reactions [34]. The example considers two chemicals and we derive a simple system
of SDEs from the fundamental mechanisms of chemical reactions. If the
concentration of one chemical is fixed, as an alternative to letting it vary
according to the chemical reactions, the fundamental mechanisms allow us to
obtain an SDE for the concentration of the remaining chemicals. This equation can
be obtained from the original system by a purely mechanistic deletion and
substitution process.

The chemicals are denoted x and y and the corresponding concentrations are
denoted $X$ and $Y$, respectively. There are four reactions

$$\emptyset \xrightarrow{\;a\;} y, \qquad y \xrightarrow{\;b_{12}Y\;} x, \qquad x \xrightarrow{\;b_{11}X\;} \emptyset, \qquad y \xrightarrow{\;b_{22}Y\;} \emptyset.$$

Here, the first reaction denotes the creation or influx of chemical y with constant
rate $a$, the second reaction denotes the change of y into x at rate $b_{12}Y$, and the
third and fourth reactions denote degradation or outflux of x and y with rates
$b_{11}X$ and $b_{22}Y$, respectively. We collect the rates into the vector

$$\lambda(X,Y) = \begin{pmatrix} a \\ b_{12}Y \\ b_{11}X \\ b_{22}Y \end{pmatrix}. \tag{2.1}$$




The so-called stoichiometric matrix

$$S = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 1 & -1 & 0 & -1 \end{pmatrix} \tag{2.2}$$

collects the information about the number of molecules, for each of the two
chemicals (rows), that are created or destroyed by each of the four reactions
(columns). There are several different stochastic and deterministic models
available. One possibility is a Markov jump process on $\mathbb{N}_0^2$ of the total number
of molecules of each chemical with the above mentioned transition rates and
transitions given in terms of $S$. A corresponding (linear) system of ODEs for the
concentrations is

$$\frac{d}{dt}\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = S\lambda(X_t, Y_t) = \begin{pmatrix} 0 \\ a \end{pmatrix} + B\begin{pmatrix} X_t \\ Y_t \end{pmatrix} \tag{2.3}$$

with

$$B = \begin{pmatrix} -b_{11} & b_{12} \\ 0 & -b_{12} - b_{22} \end{pmatrix}. \tag{2.4}$$

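A system of SDEs approximating the Markov jump process, see [2], is given by

$$\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = \begin{pmatrix} X_0 \\ Y_0 + at \end{pmatrix} + \int_0^t B\begin{pmatrix} X_s \\ Y_s \end{pmatrix} ds + \int_0^t \Sigma(X_s, Y_s)\,dW_s, \tag{2.5}$$

where $W_s$ denotes a four-dimensional Wiener process and

$$\Sigma(X,Y) = S\,\mathrm{diag}\sqrt{\lambda(X,Y)} = \begin{pmatrix} 0 & \sqrt{b_{12}Y} & -\sqrt{b_{11}X} & 0 \\ \sqrt{a} & -\sqrt{b_{12}Y} & 0 & -\sqrt{b_{22}Y} \end{pmatrix}. \tag{2.6}$$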

If we are able to fix the concentration $Y_t$ at a level $c$, we effectively remove the
first and last of the reactions and the second will have the constant rate $b_{12}c$. By
arguments as above we derive the SDE

$$X_t = X_0 + t\,b_{12}c - \int_0^t b_{11}X_s\,ds + \int_0^t \sigma(X_s)\,dW_s \tag{2.7}$$

with $W_s$ a two-dimensional Wiener process and $\sigma(x) = (\sqrt{b_{12}c}, -\sqrt{b_{11}x})$. We
observe that this SDE can be obtained from (2.5) by deleting the equation for $Y_t$
and substituting $Y_t$ with $c$ in the remaining equation.

It should be noted that the SDEs in this example do not satisfy the usual Lipschitz
conditions due to the square root. To avoid technical issues we can cap all entries
in $\lambda(X, Y)$ at a sufficiently small lower level $\varepsilon$ and a sufficiently large upper level
$C$ with $0 < \varepsilon < C < \infty$. The resulting SDE will then have bounded Lipschitz
coefficients.



Example 2.1 illustrates how a model for the intervention in a system can be
obtained from a model for the entire system. In this particular example, the
resulting model can be justified by reference to the fundamental mechanisms, the
chemical reactions, that drive the system, and interventions result in SDEs
modified by substitution and deletion. While noting that such an argument may not
always be justified, we will use this principle as a general, purely
probabilistic definition of interventions in SDEs. Note also that in the example
above the diffusion matrix

$$\Sigma(X,Y)\Sigma(X,Y)^t = \begin{pmatrix} b_{12}Y + b_{11}X & -b_{12}Y \\ -b_{12}Y & a + b_{12}Y + b_{22}Y \end{pmatrix} \tag{2.8}$$

is not diagonal, implying that the martingale parts of the semimartingale $(X, Y)$
are not orthogonal. This shows that there are naturally occurring situations where
it is necessary to consider models with non-orthogonal martingale parts, a
situation excluded in the WCLI framework of [14].
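The algebra in Example 2.1 can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names and the parameter values are illustrative assumptions. It builds $S$ and $\lambda$ as in (2.1) and (2.2), and compares $S\lambda$ with the linear form (2.3)-(2.4) and $\Sigma\Sigma^t$ with (2.8).

```python
import math

# Stoichiometric matrix S from (2.2): rows are the chemicals (x, y),
# columns are the four reactions.
S = [[0, 1, -1, 0],
     [1, -1, 0, -1]]

def rates(x, y, a, b11, b12, b22):
    """Rate vector lambda(X, Y) from (2.1)."""
    return [a, b12 * y, b11 * x, b22 * y]

def drift(x, y, a, b11, b12, b22):
    """ODE right-hand side S * lambda(X, Y), the first form in (2.3)."""
    lam = rates(x, y, a, b11, b12, b22)
    return [sum(S[i][k] * lam[k] for k in range(4)) for i in range(2)]

def linear_drift(x, y, a, b11, b12, b22):
    """The same drift written as (0, a) + B (X, Y) with B from (2.4)."""
    return [-b11 * x + b12 * y, a - (b12 + b22) * y]

def diffusion_matrix(x, y, a, b11, b12, b22):
    """Sigma Sigma^t with Sigma = S diag(sqrt(lambda)) as in (2.6)."""
    lam = rates(x, y, a, b11, b12, b22)
    sig = [[S[i][k] * math.sqrt(lam[k]) for k in range(4)] for i in range(2)]
    return [[sum(sig[i][k] * sig[j][k] for k in range(4)) for j in range(2)]
            for i in range(2)]

# Hypothetical parameter values, chosen only for illustration.
params = dict(a=1.0, b11=0.5, b12=2.0, b22=0.25)
print(drift(3.0, 4.0, **params), linear_drift(3.0, 4.0, **params))
print(diffusion_matrix(3.0, 4.0, **params))
```

The off-diagonal entry of the printed matrix is $-b_{12}Y$, which is nonzero whenever $Y > 0$, in line with the remark that the martingale parts are not orthogonal.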

In order to formalize our definition in a general framework, let $Z$ be a $d$-dimensional
semimartingale and assume that $a : \mathbb{R}^p \to \mathbb{M}(p, d)$ is a continuous mapping, where
$\mathbb{M}(p, d)$ denotes the space of real $p \times d$ matrices. We consider the stochastic
differential equation

$$X_t^i = x_0^i + \sum_{j=1}^d \int_0^t a_{ij}(X_{s-})\,dZ_s^j, \qquad i \le p. \tag{2.9}$$



Definition 2.2. Consider some $m \le p$ and $c \in \mathbb{R}$. The stochastic differential
equation arising from (2.9) under the intervention $X^m := c$ is

$$Y_t^i = x_0^i + \sum_{j=1}^d \int_0^t b_{ij}(Y_{s-})\,dZ_s^j, \quad i \le p,\ i \ne m, \quad \text{and} \quad Y_t^m = c, \tag{2.10}$$

where $b_{ij}(y_1, \dots, y_p) = a_{ij}(y_1, \dots, c, \dots, y_p)$, and the $c$ is on the $m$'th coordinate.
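Definition 2.2 acts on the coefficient mapping by substitution and deletion, which can be sketched as a higher-order function. This is an illustration only: the name `intervene`, the 0-based index `m` and the example coefficient are assumptions made here, not notation from the paper.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Coefficient = Callable[[Sequence[float]], Matrix]

def intervene(a: Coefficient, m: int, c: float) -> Coefficient:
    """Return b with b(y) = a(y_1, ..., c, ..., y_p), where c sits in
    position m (0-based). Row m is dropped, because Y^m is fixed at c
    instead of being driven by Z."""
    def b(y: Sequence[float]) -> Matrix:
        y_fixed = list(y)
        y_fixed[m] = c                      # substitution
        rows = a(y_fixed)
        return [row for i, row in enumerate(rows) if i != m]  # deletion
    return b

# Hypothetical coefficient with p = 2 and d = 2.
def a_example(y: Sequence[float]) -> Matrix:
    return [[y[1], 1.0],
            [y[0] * y[1], 0.0]]

b_example = intervene(a_example, m=1, c=2.0)
print(b_example([5.0, 99.0]))  # the value 99.0 is ignored, c = 2.0 is used
```

Note that the returned function still accepts a $p$-dimensional argument and simply ignores its $m$'th entry, matching the convention $b_{ij}(y_1, \dots, y_p)$ in the definition.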

By Definition 2.2, intervening takes an SDE as its argument and yields another
SDE. Note that existence and uniqueness of solutions are not required for
Definition 2.2 to make sense, although we will mainly take interest in cases where
both (2.9) and (2.10) have unique solutions. By Theorem V.7 of [26], this is the
case whenever the mapping $a$ is Lipschitz.

Assume that (2.9) and (2.10) have unique solutions for all interventions. We refer
to (2.9) as the observational SDE, to the solution of (2.9) as the observational
process, and to the distribution of the solution of (2.9) as the observational
distribution. We refer to (2.10) as the postintervention SDE, to the solution of
(2.10) as the postintervention process and to the distribution of the solution to
(2.10) as the postintervention distribution.



3. Terminology of SEMs, DAGs and interventions 



In this section, we review the basic notions related to intervention calculus for
structural equation models. For a detailed overview, see [22] or [32]. We will use
these notions in Section 4 to interpret our definition of intervention for SDEs in
terms of intervention calculus for structural equation models.




Let $V$ be a finite set, and let $E$ be a subset of $V \times V$. A directed graph $G$ on
$V$ is a pair $(V, E)$. We refer to $V$ as the vertex set, and refer to $E$ as the edge
set. A path is an unbroken series of vertices and edges such that no vertices
are repeated except possibly the initial and terminal vertices. A cycle is a path
with the same initial and terminal vertices. We say that $G$ is a directed acyclic
graph (DAG) if $G$ contains no cycles. For any graph $G$ and $i \in V$, we write
$\mathrm{pa}(i) = \{j \in V \mid (j, i) \in E\}$, and refer to $\mathrm{pa}(i)$ as the parents of the vertex $i$. If
we wish to emphasize the graph $G$, we also write $\mathrm{pa}_G(i)$.

A structural equation model (SEM) consists of three components:

(1) Two families $(X_i)_{i \in V}$ and $(U_i)_{i \in V}$ of random variables.

(2) A directed acyclic graph $G$ on $V$.

(3) A set of functional relationships $X_i = f_i(X_{\mathrm{pa}(i)}, U_i)$.

We refer to $(X_i)_{i \in V}$ as the primary variables and $(U_i)_{i \in V}$ as the noise variables.
The idea behind a SEM is that the DAG provides the sequence in which the
functional relationships are evaluated, thus yielding an algorithm for obtaining
the values of $(X_i)_{i \in V}$ from $(U_i)_{i \in V}$. A SEM does not only yield the distribution
of the variables $(X_i)_{i \in V}$, but also a description of the data-generating mechanism.
This is made precise by the notion of an intervention, see also Definition 3.2.1 of
[22].



Definition 3.1. Consider a SEM with primary variables $(X_i)_{i \in V}$, noise variables
$(U_i)_{i \in V}$, DAG $G$ and functional relationships $X_i = f_i(X_{\mathrm{pa}(i)}, U_i)$. Let $A$ be a
subset of $V$. The postintervention SEM doing $X_j := x_j$ for $j \in A$ is the SEM
with primary variables $(X_i)_{i \in V}$, noise variables $(U_i)_{i \in V}$, DAG $G'$ obtained by
removing all edges with terminal vertices $j \in A$ from $G$, and functional relationships
obtained by substituting $x_j$ for $X_j$ in all functional relationships with $j \in A$ as
well as exchanging all equations corresponding to indices $j \in A$ with the simple
equations $X_j = x_j$.



The idea behind Definition 3.1 is that if the algorithm implicit in a SEM represents
the data-generating mechanism for $(X_i)_{i \in V}$, then an intervention in the system
resulting in fixing $X_j$ at the value $x_j$ for $j \in A$ would yield a data-generating
mechanism corresponding to substituting the value $x_j$ in all functional
relationships involving $X_j$ for $j \in A$.
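The algorithmic reading of a SEM and of Definition 3.1 can be made concrete with a small sketch. The representation below (dictionaries keyed by vertex, a parent list per vertex, functions taking the parent values and the noise value) is an assumption chosen for illustration, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def evaluate(parents, functions, noise):
    """Evaluate a SEM: visit vertices in an order compatible with the DAG
    and apply X_i = f_i(X_pa(i), U_i)."""
    values = {}
    for i in TopologicalSorter(parents).static_order():
        values[i] = functions[i]([values[j] for j in parents[i]], noise[i])
    return values

def do(parents, functions, targets):
    """Postintervention SEM for X_j := x_j with j ranging over targets.
    Edges into intervened vertices are removed and their equations are
    replaced by constants; children then read the substituted value."""
    new_parents = {i: ([] if i in targets else list(pa))
                   for i, pa in parents.items()}
    new_functions = dict(functions)
    for j, x_j in targets.items():
        new_functions[j] = lambda pa_values, u, x_j=x_j: x_j
    return new_parents, new_functions

# Hypothetical chain 1 -> 2 -> 3.
parents = {1: [], 2: [1], 3: [2]}
functions = {1: lambda pa, u: u,
             2: lambda pa, u: 2 * pa[0] + u,
             3: lambda pa, u: pa[0] + u}
noise = {1: 1.0, 2: 0.5, 3: -1.0}

print(evaluate(parents, functions, noise))
print(evaluate(*do(parents, functions, {2: 10.0}), noise))
```

In the second call the noise variables are reused unchanged, which reflects that an intervention alters the functional relationships but not the noise.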



4. Interpretation of continuous-time interventions 



In this section, we show that under Lipschitz conditions on the coefficients in (2.9),
the solution to the postintervention SDE described in Definition 2.2 is the limit of
a sequence of postintervention SEMs based on the Euler scheme of (2.9). We use
this to clarify the role of the driving semimartingales $Z^1, \dots, Z^d$.






Definition 4.1. The signature of the SDE (2.9) is the graph $S = (V, E)$ with
vertex set $\{1, \dots, p\}$ and no edge from $i$ to $j$ if and only if it holds for all $k$ that
$a_{jk}$ does not depend on the $i$'th coordinate.



From an intuitive viewpoint, the signature $S$ defined in Definition 4.1 describes
which coordinates of the SDE (2.9) are causally dependent on each other in an
infinitesimal sense: there is an edge from $i$ to $j$ if and only if $X^i$ has an infinitesimal
causal effect on $X^j$. If there is no edge from $i$ to $j$ we will say that $X^j$ is locally
unaffected by $X^i$. The signature is used in the following definition to define an SEM
corresponding to the Euler scheme for (2.9). With a slight abuse of notation we
choose in Definition 4.2 for convenience to consider the initial variables $X_0^1, \dots, X_0^p$
as primary variables instead of noise variables. This is not a problem as it is obvious
how interventions for the SEM given in Definition 4.2 should be understood.



Definition 4.2. Fix $T > 0$ and consider $\Delta > 0$ such that $T/\Delta$ is a natural
number. Let $N = T/\Delta$ and $t_k = k\Delta$. The Euler SEM over $[0, T]$ with step size $\Delta$
for (2.9) consists of the following:

(1) The primary variables are the $p(N + 1)$ variables in the set $((X^\Delta)^i_{t_k})_{0 \le k \le N,\, i \le p}$.

(2) The noise variables are the $pN$ variables $(Z^j_{t_k} - Z^j_{t_{k-1}})_{1 \le k \le N}$.

(3) The DAG is the graph $G = (V, E)$ with vertex set $\{1, \dots, N\} \times \{1, \dots, p\}$
defined by having $((i_1, j_1), (i_2, j_2))$ be an edge of $G$ if and only if $i_2 = i_1 + 1$
and $(j_1, j_2)$ is an edge in the signature of (2.9).

(4) The functional relationships are given by:

$$(X^\Delta)^i_{t_k} = (X^\Delta)^i_{t_{k-1}} + \sum_{j=1}^d a_{ij}\big((X^\Delta)_{t_{k-1}}\big)\big(Z^j_{t_k} - Z^j_{t_{k-1}}\big). \tag{4.1}$$
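Item (3) of Definition 4.2 is a purely combinatorial construction and can be sketched directly. The sketch below makes two assumptions that are not in the definition: the signature is supplied by hand as a set of pairs (the example includes self-loops so that each coordinate has its own previous value as a parent), and the time index runs over $0, \dots, N$ to match the primary variables in item (1).

```python
def euler_dag_edges(signature, N):
    """Edges ((k, i), (k + 1, j)) for every signature edge (i, j) and
    every time index k = 0, ..., N - 1."""
    return {((k, i), (k + 1, j)) for k in range(N) for (i, j) in signature}

def remove_incoming(edges, m):
    """Drop all edges whose terminal vertex belongs to coordinate m, as in
    the postintervention DAG for X^m := c at every time point."""
    return {(u, v) for (u, v) in edges if v[1] != m}

# Hypothetical signature on two coordinates: 1 -> 1, 1 -> 2, 2 -> 2.
signature = {(1, 1), (1, 2), (2, 2)}
edges = euler_dag_edges(signature, N=2)
print(sorted(edges))
print(sorted(remove_incoming(edges, m=1)))
```

After the intervention on coordinate 1, the only remaining edges lead into coordinate 2, which is the edge removal described in Definition 3.1 applied to the Euler SEM.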

A visual interpretation of the SEM of Definition 4.2 is shown in Figure 4.1. The
figure shows how the signature $S$ determines the DAG describing the algorithm
for calculating the variables in the Euler SEMs. Making the intervention $X^1_{t_k} := c$
for all $k$ corresponds to removing all edges of the DAG in Figure 4.1 with terminal
vertex in the top row.

The following two theorems yield our main results for this section. 

Theorem 4.3. Fix $T > 0$ and let $(\Delta_n)_{n \ge 1}$ be a sequence of positive numbers
converging to zero such that $T/\Delta_n$ is natural for all $n \ge 1$. For each $n$, there
exists a pathwisely unique solution to the equation

$$(X^n)^i_t = x_0^i + \sum_{j=1}^d \int_0^t a_{ij}\big((X^n)_{\eta_n(s-)}\big)\,dZ_s^j, \qquad i \le p, \tag{4.2}$$

where $\eta_n(t) = k\Delta_n$ for $k\Delta_n \le t < (k + 1)\Delta_n$, satisfying that $((X^n)_{t_k})_{k \le T/\Delta_n}$
are the primary variables in the Euler SEM for (2.9), and $\sup_{0 \le t \le T} |X_t - X^n_t|$
converges in probability to zero.



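[Figure: signature graph on three vertices (left); DAG of the Euler SEM with
noise increments $Z_\Delta - Z_0$, $Z_{2\Delta} - Z_\Delta$ and $Z_{3\Delta} - Z_{2\Delta}$ (right).]

Figure 4.1. The signature for a three-dimensional SDE (left)
and the DAG for the corresponding Euler SEM (right).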



Proof. By inspection, (4.2) has a unique solution, and $((X^n)_{t_k})_{k \le T/\Delta_n}$ are the
primary variables in the Euler SEM for (2.9). That $\sup_{0 \le t \le T} |X_t - X^n_t|$ converges in
probability to zero is the corollary to Theorem V.16 of [26]. □
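The convergence in Theorem 4.3 can be observed in a deliberately simple deterministic case. The sketch below assumes the scalar equation $dX_t = -X_t\,dZ_t$ with driving semimartingale $Z_t = t$ and $x_0 = 1$, whose solution is $e^{-t}$; this example and the function names are illustrative choices, and the supremum is only taken over the grid points $t_k$.

```python
import math

def euler_path(a, x0, increments):
    """Scalar Euler scheme: X_k = X_{k-1} + a(X_{k-1}) * (Z_k - Z_{k-1})."""
    path = [x0]
    for dz in increments:
        path.append(path[-1] + a(path[-1]) * dz)
    return path

def sup_error_on_grid(N, T=1.0):
    """Largest gap between the Euler path and exp(-t) over the grid t_k = k T / N."""
    delta = T / N
    path = euler_path(lambda x: -x, 1.0, [delta] * N)
    return max(abs(x - math.exp(-k * delta)) for k, x in enumerate(path))

for N in (10, 100, 1000):
    print(N, sup_error_on_grid(N))
```

The printed errors shrink roughly in proportion to the step size, which is the deterministic counterpart of the uniform convergence in probability stated in the theorem.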

Theorem 4.4. Fix $T > 0$ and consider $\Delta > 0$ such that $T/\Delta$ is a natural number.
Fix $m \le p$, $c \in \mathbb{R}$. The Euler SEM for the stochastic differential equation (2.10) is
equal to the postintervention SEM obtained by making the intervention $(X^\Delta)^m_{t_k} := c$
for $0 \le k \le T/\Delta$ in the Euler SEM for (2.9).



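Proof. The functional relationships in the Euler SEM for (2.9) are

$$(X^\Delta)^i_{t_k} = (X^\Delta)^i_{t_{k-1}} + \sum_{j=1}^d a_{ij}\big((X^\Delta)_{t_{k-1}}\big)\big(Z^j_{t_k} - Z^j_{t_{k-1}}\big), \tag{4.3}$$

while for (2.10) and $i \ne m$, they are

$$(Y^\Delta)^i_{t_k} = (Y^\Delta)^i_{t_{k-1}} + \sum_{j=1}^d b_{ij}\big((Y^\Delta)_{t_{k-1}}\big)\big(Z^j_{t_k} - Z^j_{t_{k-1}}\big), \tag{4.4}$$

where $b_{ij}(y) = a_{ij}(y_1, \dots, c, \dots, y_p)$. By inspection, (4.4) is the result of
substituting $c$ for $(X^\Delta)^m_{t_k}$ in (4.3). The result follows. □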
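The identity in Theorem 4.4 can be checked on simulated increments. The following sketch is an illustration under stated assumptions: the coefficient `a_example`, the Gaussian increments and the 0-based index `m` are hypothetical choices. It compares the intervened Euler SEM for (2.9) with the Euler SEM of the postintervention SDE (2.10) on the nonintervened coordinates.

```python
import math
import random

def step(state, coef_rows, dz):
    """One Euler update per row: state_i + sum_j coef_ij * dz_j."""
    return [s + sum(c_ij * d for c_ij, d in zip(row, dz))
            for s, row in zip(state, coef_rows)]

def intervened_euler_sem(a, x0, increments, m, c):
    """Euler SEM for (2.9) with the equations for coordinate m replaced by X^m := c."""
    state = list(x0)
    state[m] = c
    path = [state]
    for dz in increments:
        new = step(path[-1], a(path[-1]), dz)
        new[m] = c  # the intervened equation overrides the Euler update
        path.append(new)
    return path

def euler_sem_of_postintervention_sde(a, x0, increments, m, c):
    """Euler SEM for (2.10), tracking only the coordinates i != m."""
    def b(y_rest):
        full = list(y_rest[:m]) + [c] + list(y_rest[m:])
        return [row for i, row in enumerate(a(full)) if i != m]
    path = [[v for i, v in enumerate(x0) if i != m]]
    for dz in increments:
        path.append(step(path[-1], b(path[-1]), dz))
    return path

def a_example(x):
    """Hypothetical Lipschitz coefficient with p = 3 and d = 2."""
    return [[math.sin(x[1]), 1.0],
            [x[0] - x[2], 0.5],
            [math.cos(x[0]), 0.1 * x[1]]]

rng = random.Random(0)
increments = [[rng.gauss(0.0, 0.1) for _ in range(2)] for _ in range(50)]
m, c, x0 = 1, 2.0, [0.0, 1.0, -1.0]
lhs = [[v for i, v in enumerate(s) if i != m]
       for s in intervened_euler_sem(a_example, x0, increments, m, c)]
rhs = euler_sem_of_postintervention_sde(a_example, x0, increments, m, c)
max_gap = max(abs(u - v) for s, t in zip(lhs, rhs) for u, v in zip(s, t))
print(max_gap)
```

Both constructions consume the same increments of $Z$, which is where the autonomy of the driving semimartingale enters the comparison.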



Together, Theorem 4.3 and Theorem 4.4 state that the diagram in Figure 4.2
commutes: defining interventions directly in terms of changing the terms in the
stochastic differential equation has the same effect as intervening in the Euler SEM
and taking the limit.



These results clarify what Definition 2.2 means: we consider the semimartingale
$Z$ as "autonomous" and assume that interventions do not influence this
semimartingale. Concluding this section, we give two examples to illustrate the
nature of interventions. In Example 4.5 we calculate the postintervention SDE for
an intervention in an Ornstein-Uhlenbeck SDE, and in Example 4.6 we illustrate the
necessity of a sharp division between autonomous and non-autonomous
interpretations of processes.

[Diagram with four nodes: Euler SEM for observational SDE, Postintervention
Euler SEM, Observational SDE, Postintervention SDE.]

Figure 4.2. The interpretation of intervention in a stochastic
differential equation understood as the limit of interventions in
the Euler SEMs.

Example 4.5. Let $x_0 \in \mathbb{R}^p$, $A \in \mathbb{R}^p$, $B \in \mathbb{M}(p, p)$ and $\sigma \in \mathbb{M}(p, d)$. The Ornstein-
Uhlenbeck SDE with initial value $x_0$, mean reversion level $A$, mean reversion speed
$B$, diffusion matrix $\sigma$ and $d$-dimensional driving noise is

$$X_t = x_0 + \int_0^t B(X_s - A)\,ds + \sigma W_t, \tag{4.5}$$

where $W$ is a $d$-dimensional $(\mathcal{F}_t)$ Brownian motion, see Section 11.72 of [27]. Fix
$m \le p$ and $c \in \mathbb{R}$. The SDE resulting from making the intervention $X^m := c$ is



$$Y_t^i = x_0^i + \int_0^t \Big( \sum_{j=1,\, j \ne m}^p B_{ij}(Y_s^j - A_j) + B_{im}(c - A_m) \Big)\,ds + \sum_{j=1}^d \sigma_{ij} W_t^j \tag{4.6}$$



for $i \ne m$. Now let $\tilde{B}$ be the submatrix of $B$ obtained by removing the $m$'th row
and column of $B$, and assume that $\tilde{B}$ is invertible. With $Y^{-m}$ denoting the $p - 1$
dimensional process obtained by removing the $m$'th coordinate from $Y$, we then
obtain

$$Y_t^{-m} = y_0 + \int_0^t \tilde{B}(Y_s^{-m} - \tilde{A})\,ds + \tilde{\sigma} W_t, \tag{4.7}$$

where $y_0$ is obtained by removing the $m$'th coordinate from $x_0$, $\tilde{\sigma}$ is obtained by
removing the $m$'th row of $\sigma$ and $\tilde{A} = \alpha - \tilde{B}^{-1}\beta$, where $\alpha$ and $\beta$ are obtained
by removing the $m$'th coordinate from $A$ and from the vector whose $i$'th component
is $B_{im}(c - A_m)$, respectively. Thus, $Y^{-m}$ solves an Ornstein-Uhlenbeck SDE with
initial value $y_0$, mean reversion level $\tilde{A}$, mean reversion speed $\tilde{B}$ and diffusion
matrix $\tilde{\sigma}$. ∘
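The parameter map in Example 4.5 is easy to compute explicitly. The sketch below is an illustration with hypothetical numbers: it forms the submatrix of $B$, the vectors $\alpha$ and $\beta$, and the new mean reversion level $\alpha - \tilde{B}^{-1}\beta$, using a small Gaussian elimination routine written here for self-containedness. Indices are 0-based in the code.

```python
def solve(M, v):
    """Solve M z = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ou_postintervention(A, B, m, c):
    """Return (B_tilde, A_tilde) for the intervention X^m := c."""
    keep = [i for i in range(len(A)) if i != m]
    B_tilde = [[B[i][j] for j in keep] for i in keep]
    alpha = [A[i] for i in keep]
    beta = [B[i][m] * (c - A[m]) for i in keep]
    correction = solve(B_tilde, beta)  # B_tilde^{-1} beta
    A_tilde = [al - co for al, co in zip(alpha, correction)]
    return B_tilde, A_tilde

# Hypothetical parameters with p = 3, intervening on the middle coordinate.
A = [1.0, 2.0, 3.0]
B = [[-1.0, 0.5, 0.0],
     [0.2, -2.0, 0.3],
     [0.0, 0.4, -1.5]]
B_tilde, A_tilde = ou_postintervention(A, B, m=1, c=5.0)
print(B_tilde, A_tilde)
```

One way to read the result: the drift $\tilde{B}(y - \tilde{A})$ agrees with the drift in (4.6) for every $y$, so the intervention shifts the mean reversion level of the remaining coordinates while leaving their mutual mean reversion speeds unchanged.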



The next example shows that an SDE may not be amenable to a causal interpre- 
tation. 

Example 4.6. Let $X^1 = W$ be a one-dimensional Wiener process, let $f : \mathbb{R} \to \mathbb{R}$
be twice continuously differentiable and let $X^2 = f(X^1)$. If this relation really
constitutes the causal relation between $X^1$ and $X^2$, the result of the atomic
intervention $X^1 := c$ is that $X^2 = f(c)$.

However, from Itô's lemma

$$X_t^2 = f(X_0^1) + \frac{1}{2}\int_0^t f''(X_s^1)\,d[X^1]_s + \int_0^t f'(X_s^1)\,dX_s^1 = f(0) + \frac{1}{2}\int_0^t f''(X_s^1)\,ds + \int_0^t f'(X_s^1)\,dW_s. \tag{4.8}$$

If we use Definition 2.2, the resulting postintervention SDE for $X^2$ under the
intervention $X^1 := c$ becomes

$$X_t^2 = f(0) + \frac{1}{2} f''(c)\,t + f'(c)\,W_t.$$

The problem is that by substituting $W$ for $X^1$ in the SDE that we derived from
Itô's lemma, the resulting SDE loses its causal interpretation. The driving process
$W$ is not autonomous, and the postintervention SDE does not give the desired
result.

We should note that it is not the use of Itô's lemma in itself that is the problem,
it is the subsequent substitution of $X^1$ by $W$. In fact, if we intervene directly in
(4.8) by replacing $X^1$ by the constant $c$ the result would be that $X^2 = f(c)$. We
could thus say that (4.8) retains the causal interpretation. However, Definition 2.2
does not allow for such interventions on the integrators. To do so generally would
complicate matters considerably, and we will not pursue this any further.
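To make the discrepancy in Example 4.6 tangible, consider the illustrative choice $f(x) = x^2$, which is an assumption made here and not a case treated in the text. Then $f(0) = 0$, $f'(x) = 2x$ and $f''(x) = 2$, and the three expressions discussed above become:

```latex
% Worked instance of Example 4.6 with the illustrative choice f(x) = x^2.
\begin{align*}
X_t^2 &= t + 2\int_0^t X_s^1 \, dW_s
  && \text{(4.8) with } f(x) = x^2, \\
X_t^2 &= t + 2c\,W_t
  && \text{Definition 2.2 applied after substituting } W \text{ for } X^1, \\
X_t^2 &= c^2
  && \text{the causally intended result } f(c).
\end{align*}
```

The second line is a Brownian motion with drift, whereas the third is a constant, so the two notions of intervention disagree for every value of $c$.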



5. Identifiability of postintervention distributions 



In this section, we prove a result giving conditions for the postintervention
distributions to be uniquely determined by the observational distribution. Our
objective is to show that this uniqueness holds when the driving semimartingale
for the SDE is a Lévy process.

Our methods will make use of the theory of Markov processes and their generators.
We begin by reviewing some basic concepts. Recall from Chapter 4 of [8] that a
family of transition probabilities on $\mathbb{R}^p$ is a family of probability measures $P_t(x, \cdot)$
for $t \ge 0$ and $x \in \mathbb{R}^p$ such that $(t, x) \mapsto P_t(x, B)$ is measurable for all Borel
measurable $B$, $P_0(x, \cdot)$ is the Dirac measure in $x$ and for all $t, s \ge 0$ it holds
that $P_{t+s}(x, B) = \int_{\mathbb{R}^p} P_s(y, B)\,P_t(x, dy)$. Given a càdlàg stochastic process $X$
with values in $\mathbb{R}^p$, we say that $X$ is an $(\mathcal{F}_t)$ Markov process if there is a family
$P_t(x, \cdot)$ of transition probabilities on $\mathbb{R}^p$ such that for $s, t \ge 0$ and $B \in \mathcal{B}_p$, it holds
that $P(X_{t+s} \in B \mid \mathcal{F}_t) = P_s(X_t, B)$ almost surely. If this holds with the filtration
induced by the process itself, we simply say that $X$ is a Markov process.




Let $b(\mathbb{R}^p)$ denote the space of bounded Borel measurable functions from $\mathbb{R}^p$ to $\mathbb{R}$.
For a family of transition probabilities $P_t(x, \cdot)$, we define $P_t : b(\mathbb{R}^p) \to b(\mathbb{R}^p)$ by
$P_t f(x) = \int f(y)\,P_t(x, dy)$. The mapping $P_t$ is then a linear operator on $b(\mathbb{R}^p)$.
Furthermore, $P_0$ is the identity operator, $\|P_t\| \le 1$ for all $t \ge 0$ where $\|\cdot\|$ denotes
the operator norm induced by the uniform norm on $b(\mathbb{R}^p)$, and it holds that
$P_{t+s} = P_t P_s$ for $t, s \ge 0$, meaning that $(P_t)$ is a contraction semigroup.

Next, let $C_0(\mathbb{R}^p)$ denote the set of continuous mappings from $\mathbb{R}^p$ to $\mathbb{R}$ vanishing
at infinity, see Chapter 5 of [21]. Also, let $C_c(\mathbb{R}^p)$ denote the set of continuous
mappings from $\mathbb{R}^p$ to $\mathbb{R}$ with compact support, and let $C_c^2(\mathbb{R}^p)$ denote the subset
of $C_c(\mathbb{R}^p)$ which are twice continuously differentiable. We say that the semigroup
$(P_t)$ is Feller if $P_t$ maps $C_0(\mathbb{R}^p)$ into itself and $t \mapsto P_t f$ is continuous in the
uniform norm for each $f \in C_0(\mathbb{R}^p)$. In this case, we let $\mathcal{D}(A)$ be the set of $f \in C_0(\mathbb{R}^p)$ where
$\lim_{t \to 0} t^{-1}(P_t f - P_0 f)$ exists as a limit in $C_0(\mathbb{R}^p)$, and when it exists, we let $Af$
denote the limit. We refer to $\mathcal{D}(A)$ as the domain of $A$. By Corollary 1.1.6 of [8],
$A$ is then a densely defined and closed linear operator on $C_0(\mathbb{R}^p)$. Finally, if $X$ is a
càdlàg Markov process with a Feller semigroup, we say that $X$ is a Feller process.
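The semigroup and generator definitions can be illustrated with one-dimensional standard Brownian motion, an example chosen here and not taken from the text. For functions of the form $f(x) = \kappa e^{-\lambda x^2}$ the operator $P_t$ has a closed form, so the semigroup property and the limit defining the generator, which for Brownian motion is $\tfrac{1}{2} f''$, can be checked numerically. The function names below are hypothetical.

```python
import math

def heat_semigroup(t, coef, lam):
    """Apply P_t of standard Brownian motion to f(x) = coef * exp(-lam * x**2).
    The image is again of this form; the new (coef, lam) pair is returned."""
    scale = 1.0 + 2.0 * lam * t
    return coef / math.sqrt(scale), lam / scale

def evaluate(coef, lam, x):
    """Value of coef * exp(-lam * x**2) at the point x."""
    return coef * math.exp(-lam * x * x)

def generator_fd(x, h=1e-6):
    """Difference quotient (P_h f - f)(x) / h for f(x) = exp(-x**2)."""
    return (evaluate(*heat_semigroup(h, 1.0, 1.0), x) - evaluate(1.0, 1.0, x)) / h

def half_second_derivative(x):
    """(1/2) f''(x) for f(x) = exp(-x**2)."""
    return (2.0 * x * x - 1.0) * math.exp(-x * x)

print(heat_semigroup(0.3, *heat_semigroup(0.5, 1.0, 1.0)), heat_semigroup(0.8, 1.0, 1.0))
print(generator_fd(0.7), half_second_derivative(0.7))
```

The returned coefficient never exceeds the input coefficient, which reflects the contraction property $\|P_t\| \le 1$ on this family of functions.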

To prove our results, we will need the following two technical lemmas. 

Lemma 5.1. Assume that $X$ and $Y$ are two Feller processes. If the domains of
both generators contain $C_c^2(\mathbb{R}^p)$ and the generators agree on this set, and the initial
distributions of $X$ and $Y$ are equal, then $X$ and $Y$ have the same distribution.



Proof. Let $(P_t)$ and $(Q_t)$ be the transition semigroups of $X$ and $Y$, respectively,
restricted to $C_0(\mathbb{R}^p)$. By our assumptions, both semigroups are then strongly
continuous contraction semigroups. Applying Proposition 1.2.7 of [8], we obtain
that $P_t f = Q_t f$ for all $f \in C_c^2(\mathbb{R}^p)$. As two probability measures on $\mathbb{R}^p$ are equal
if their integrals of elements in $C_c^2(\mathbb{R}^p)$ are equal, it follows that $X$ and $Y$ have
the same transition probabilities, yielding by Theorem 4.1.1 of [8] that $X$ and $Y$
have the same distribution. □

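Lemma 5.2. Fix $x \in \mathbb{R}^p$ and let $D$ be a bounded neighborhood of zero in $\mathbb{R}^p$.
Let $a, \tilde{a} \in \mathbb{R}^p$ and $b, \tilde{b} \in \mathbb{M}(p, p)$, and let $\nu$ and $\tilde{\nu}$ be two measures on $\mathbb{R}^p$ such
that $x \mapsto \min\{1, \|x\|^2\}$ is integrable with respect to $\nu$ and $\tilde{\nu}$. Consider two linear
functionals $A$ and $\tilde{A}$ from $C_c^2(\mathbb{R}^p)$ to $\mathbb{R}$, where $A$ is given by

$$Af = \sum_{i=1}^p a_i \frac{\partial f}{\partial x_i}(x) + \frac{1}{2}\sum_{i=1}^p \sum_{j=1}^p b_{ij} \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \int f(x + y) - f(x) - 1_D(y) \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x)\,y_i \,d\nu(y),$$

and $\tilde{A}$ is given by the same expression, with $\tilde{a}$, $\tilde{b}$ and $\tilde{\nu}$ substituted for $a$, $b$ and $\nu$.
It then holds that $A = \tilde{A}$ if and only if $a = \tilde{a}$, $b = \tilde{b}$ and $\nu = \tilde{\nu}$ on $\mathbb{R}^p \setminus \{0\}$.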




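Proof. It is immediate that if $a = \tilde{a}$, $b = \tilde{b}$ and $\nu = \tilde{\nu}$ on $\mathbb{R}^p \setminus \{0\}$, then $A = \tilde{A}$. We
need to prove the converse. Thus, assume that $A = \tilde{A}$. Fix a neighborhood $B$ of
$x$ in $\mathbb{R}^p$. Assume that $B$ contains the open ball in the Euclidean metric centered
at $x$ with radius $\delta > 0$. Using approximate units such as defined in [16], we may
for $0 < \gamma < 1$ construct a family of mappings $(f_\gamma) \subseteq C_c^\infty(\mathbb{R}^p)$ with the following
properties: $f_\gamma$ is bounded by 1, $f_\gamma$ converges pointwise to $1_B$ as $\gamma$ tends to zero,
and for $\gamma < \gamma_0$, where $\gamma_0$ is some positive number, $f_\gamma$ is constant and equal to one
on the open ball in the Euclidean metric centered at $x$ with radius $\delta(1 - \gamma)$. Since
$f_\gamma$ is then constant near $x$, its first and second derivatives at $x$ vanish. For
$\gamma < \min\{\gamma_0, 1/2\}$, we then obtain

$$Af_\gamma = \int f_\gamma(x + y) - f_\gamma(x) \,d\nu(y) = \int f_\gamma(x + y) - 1 \,d\nu(y) = \int 1_{(\|y\|_2 > \delta/2)}\big(f_\gamma(x + y) - 1\big) \,d\nu(y),$$

and similarly, $\tilde{A}f_\gamma = \int 1_{(\|y\|_2 > \delta/2)}(f_\gamma(x + y) - 1) \,d\tilde{\nu}(y)$. As $x \mapsto \min\{1, \|x\|^2\}$ is
integrable with respect to $\nu$ and $\tilde{\nu}$, both these measures are bounded on the set
$\{y \in \mathbb{R}^p \mid \|y\|_2 > \delta/2\}$. Therefore, we may apply the dominated convergence
theorem and obtain

$$\lim_{\gamma \to 0} Af_\gamma = \lim_{\gamma \to 0} \int 1_{(\|y\|_2 > \delta/2)}\big(f_\gamma(x + y) - 1\big) \,d\nu(y) = \int 1_{(\|y\|_2 > \delta/2)}\big(1_B(x + y) - 1\big) \,d\nu(y) = \int 1_B(x + y) - 1 \,d\nu(y) = -\int 1_{B^c}(x + y) \,d\nu(y),$$

and similarly, $\lim_{\gamma \to 0} \tilde{A}f_\gamma = -\int 1_{B^c}(x + y) \,d\tilde{\nu}(y)$. We thus obtain

$$\int 1_{B^c}(x + y) \,d\nu(y) = -\lim_{\gamma \to 0} Af_\gamma = -\lim_{\gamma \to 0} \tilde{A}f_\gamma = \int 1_{B^c}(x + y) \,d\tilde{\nu}(y).$$

As $B$ was an arbitrary neighborhood of $x$, we conclude that $\nu$ and $\tilde{\nu}$ agree on all
closed subsets of $\mathbb{R}^p$ not containing zero. Therefore, $\nu = \tilde{\nu}$ on $\mathbb{R}^p \setminus \{0\}$. This
implies that for all $f \in C_c^2(\mathbb{R}^p)$, we have

$$\sum_{i=1}^p (a_i - \tilde{a}_i) \frac{\partial f}{\partial x_i}(x) + \frac{1}{2}\sum_{i=1}^p \sum_{j=1}^p (b_{ij} - \tilde{b}_{ij}) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) = 0.$$

Fix $i \le p$. Again applying the approximation results of Chapter 2 of [16], there
exists $f \in C_c^2(\mathbb{R}^p)$ such that $f(y) = y_i$ in a neighborhood of $x$, implying $a_i - \tilde{a}_i = 0$.
As $i$ was arbitrary, we obtain $a = \tilde{a}$. This implies that for all $f \in C_c^2(\mathbb{R}^p)$, we have

$$\frac{1}{2}\sum_{i=1}^p \sum_{j=1}^p (b_{ij} - \tilde{b}_{ij}) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) = 0.$$

Fixing $i, j \le p$, by Chapter 2 of [16], there exists a function $f \in C_c^2(\mathbb{R}^p)$ such that
$f(y) = y_i y_j$ in a neighborhood of $x$, implying $b_{ij} - \tilde{b}_{ij} = 0$. This completes the
proof. □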




Note that in the proof of Lemma 5.2, if we had let $A$ and $\tilde{A}$ be functionals given
by the same types of expressions, but with differing neighborhoods $D$ and $\tilde{D}$ in the
integral, we would still be able to conclude that the measures $\nu$ and $\tilde{\nu}$ were the
same, but we would be unable to subtract the integrals and obtain that $a$ and $\tilde{a}$
were the same as well.

Next, we argue that the solutions to SDEs with Lévy processes as driving
semimartingales and bounded Lipschitz coefficients are Feller processes, and we
identify the generator. To this end, recall that a Lévy measure on $\mathbb{R}^d$ is a measure
assigning zero measure to $\{0\}$ and having the property that $x \mapsto \min\{1, \|x\|^2\}$ is
integrable. Further recall by Theorem 1.2.14 of [3] that for any bounded
neighborhood $D$ of zero in $\mathbb{R}^d$ and any $d$-dimensional Lévy process $X$, there is
$(a, C, \nu)$ with $a \in \mathbb{R}^d$, $C$ a positive semidefinite $d \times d$ matrix and $\nu$ a Lévy
measure, such that

$$E e^{iu^t X_t} = \exp\left( t \left[ iu^t a - \frac{1}{2} u^t C u + \int e^{iu^t x} - 1 - iu^t x\,1_D(x) \,d\nu(x) \right] \right), \tag{5.1}$$

and this triplet uniquely determines the distribution of $X$. We refer to $(a, C, \nu)$ as
the characteristics of $X$ with respect to $D$, or as the $D$-characteristics of $X$. Note
that in the statement of Lemma 5.2, the measures $\nu$ and $\tilde{\nu}$ are not required to be
Lévy measures, as we do not require that the measures assign measure zero to $\{0\}$.
This will be important, as in the proof of Theorem 5.4 we will use the lemma for
linear transformations of Lévy measures. Such measures retain their integrability
properties, but may assign non-zero measure to $\{0\}$ if the linear transformation is
non-injective.
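The dependence of the first characteristic on the choice of $D$ can be illustrated in one dimension. The sketch below assumes a Lévy measure that is a finite sum of point masses and the form of the exponent in (5.1); the function name and the Poisson example are illustrative assumptions. A Poisson process with rate 3 and unit jumps has $a = 0$ when $D = (-1/2, 1/2)$, but $a = 3$ when $D = (-2, 2)$, and both triplets give the same characteristic exponent.

```python
import cmath

def levy_exponent(u, a, C, jumps, in_D):
    """Exponent psi(u) with E exp(i u X_t) = exp(t * psi(u)) for a scalar
    Levy process whose Levy measure is a finite sum of point masses,
    given as jumps = [(size, mass), ...]. in_D(x) tests membership of D."""
    psi = 1j * u * a - 0.5 * C * u * u
    for x, mass in jumps:
        compensator = 1j * u * x if in_D(x) else 0.0
        psi += mass * (cmath.exp(1j * u * x) - 1.0 - compensator)
    return psi

jumps = [(1.0, 3.0)]  # unit jumps arriving at rate 3
u = 0.9
psi_small_D = levy_exponent(u, 0.0, 0.0, jumps, lambda x: abs(x) < 0.5)
psi_large_D = levy_exponent(u, 3.0, 0.0, jumps, lambda x: abs(x) < 2.0)
print(psi_small_D, psi_large_D)
```

This is the reason the neighborhood has to be fixed before characteristics of two processes are compared, as in the remark following Lemma 5.2.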

Theorem 5.3. Let $D$ be a bounded neighborhood of zero in $\mathbb{R}^d$, and let $E$ be a bounded
neighborhood of zero in $\mathbb{R}^p$. Consider the SDE

$$X_t^i = x_0^i + \sum_{j=1}^d \int_0^t a_{ij}(X_{s-})\,dZ_s^j, \qquad i \le p, \tag{5.2}$$

where $Z$ is a $d$-dimensional Lévy process with $D$-characteristic triplet $(a, C, \nu)$,
and $a : \mathbb{R}^p \to \mathbb{M}(p, d)$ is Lipschitz and bounded. The solution of (5.2) is a Feller
process. Furthermore, the domain of the generator for the process includes $C_c^2(\mathbb{R}^p)$,
and for $f \in C_c^2(\mathbb{R}^p)$, it holds that

$$Af(x) = \sum_{i=1}^p \beta_i(x) \frac{\partial f}{\partial x_i}(x) + \frac{1}{2}\sum_{i=1}^p \sum_{j=1}^p \big(a(x) C a(x)^t\big)_{ij} \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \int f(x + y) - f(x) - 1_E(y) \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x)\,y_i \,dT_x(\nu)(y), \tag{5.3}$$

where $T_x : \mathbb{R}^d \to \mathbb{R}^p$ is defined by $T_x(y) = a(x)y$, and

$$\beta_i(x) = \sum_{j=1}^d a_{ij}(x)\,a_j + \int \big(1_{T_x^{-1}(E)}(y) - 1_D(y)\big) \sum_{j=1}^d a_{ij}(x)\,y_j \,d\nu(y).$$
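As a sanity check on (5.3), consider the special case where the Lévy measure vanishes, so that $Z_t = at + BW_t$ with $C = BB^t$. This specialization is an illustration added here under the assumption $\nu = 0$; the jump integral and the correction term in $\beta_i$ then drop out:

```latex
% Specialization of (5.3), assuming the Levy measure vanishes (nu = 0),
% i.e. Z_t = a t + B W_t with C = B B^t.
\begin{align*}
\beta_i(x) &= \sum_{j=1}^d a_{ij}(x)\, a_j, \\
Af(x) &= \sum_{i=1}^p \beta_i(x) \frac{\partial f}{\partial x_i}(x)
  + \frac{1}{2} \sum_{i=1}^p \sum_{j=1}^p \bigl(a(x) C a(x)^t\bigr)_{ij}
    \frac{\partial^2 f}{\partial x_i \partial x_j}(x).
\end{align*}
```

This is the familiar second-order differential operator of a diffusion with drift $a(x)a$ and diffusion matrix $a(x) C a(x)^t$, and it does not depend on the choice of $D$ or $E$.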




Proof. Applying Theorem 2.4.16 of [3], we have

$$Z_t = at + BW_t + \int 1_{[0,t] \times D}(s, x)\,x \,dM(ds, dx) + \int 1_{[0,t] \times D^c}(s, x)\,x \,dN(ds, dx), \tag{5.4}$$

where $C = BB^t$ for some $B \in \mathbb{M}(d, d)$, $W$ is a $d$-dimensional Brownian motion, $N$
is a Poisson random measure on $\mathbb{R}_+ \times (\mathbb{R}^d \setminus \{0\})$ with intensity measure $m_+ \otimes \nu$,
independent of $W$, and $M$ is $N$ minus its compensator. Here, $m_+$ denotes the
Lebesgue measure on $\mathbb{R}_+$. We may then rewrite the SDE (5.2) as

$$X_t = x_0 + \int_0^t b(X_{s-})\,ds + \int_0^t \sigma(X_{s-})\,dW_s + \int 1_{[0,t] \times D}(s, y)\,F(X_{s-}, y) \,dM(ds, dy) + \int 1_{[0,t] \times D^c}(s, y)\,F(X_{s-}, y) \,dN(ds, dy), \tag{5.5}$$

where $b(x) = a(x)a$, $\sigma(x) = a(x)B$ and $F(x, y) = a(x)y$. Thus, the SDE is of the
type given as (6.12) in [3]. By Theorem 6.4.5 of [3], $X$ is therefore a Markov
process, and by Theorem 6.7.4 of [3], its transition semigroup is Feller. Furthermore,
by that same theorem, the domain of the generator $A$ includes $C_c^2(\mathbb{R}^p)$, and for
$f \in C_c^2(\mathbb{R}^p)$, it holds that

$$Af(x) = \sum_{i=1}^p b_i(x) \frac{\partial f}{\partial x_i}(x) + \frac{1}{2}\sum_{i=1}^p \sum_{j=1}^p \big(\sigma(x)\sigma(x)^t\big)_{ij} \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \int_D f(x + F(x, y)) - f(x) - \sum_{i=1}^p F_i(x, y) \frac{\partial f}{\partial x_i}(x) \,d\nu(y) + \int_{D^c} f(x + F(x, y)) - f(x) \,d\nu(y). \tag{5.6}$$

Substituting our expressions for b, a and F, we obtain 



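\[
\begin{aligned}
Af(x) ={}& \sum_{i=1}^p \sum_{j=1}^d a_{ij}(x) \alpha_j \frac{\partial f}{\partial x_i}(x)
 + \frac{1}{2} \sum_{i=1}^p \sum_{j=1}^p (a(x) C a(x)^t)_{ij} \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \\
 & + \int f(x + a(x)y) - f(x) - 1_D(y) \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x) \sum_{j=1}^d a_{ij}(x) y_j \, d\nu(y).
\end{aligned} \tag{5.7}
\]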


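Now note that by continuity of $T_x$, $T_x^{-1}(E)$ is a neighborhood of zero.
$T_x^{-1}(E)$ may be unbounded, but as we have

\[
\begin{aligned}
& \int \left( 1_{T_x^{-1}(E)}(y) - 1_D(y) \right) \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x) \sum_{j=1}^d a_{ij}(x) y_j \, d\nu(y) \\
={}& \int 1_{T_x^{-1}(E) \setminus D}(y) \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x) \sum_{j=1}^d a_{ij}(x) y_j \, d\nu(y) \\
 & - \int 1_{D \setminus T_x^{-1}(E)}(y) \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x) \sum_{j=1}^d a_{ij}(x) y_j \, d\nu(y),
\end{aligned} \tag{5.8}
\]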


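where both the final integrals are finite due to the integrability properties of $\nu$, we
find that the first integral in the above also is finite, and we then obtain

\[
\begin{aligned}
& \int f(x + a(x)y) - f(x) - 1_D(y) \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x) \sum_{j=1}^d a_{ij}(x) y_j \, d\nu(y) \\
={}& \int f(x + y) - f(x) - 1_E(y) \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x) y_i \, dT_x(\nu)(y) \\
 & + \sum_{i=1}^p \frac{\partial f}{\partial x_i}(x) \int \left( 1_{T_x^{-1}(E)}(y) - 1_D(y) \right) \sum_{j=1}^d a_{ij}(x) y_j \, d\nu(y),
\end{aligned} \tag{5.9}
\]

yielding the result. □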



Theorem 5.3 characterizes the distribution of the solution of (5.2). In order to
prove identifiability, it suffices to show that the postintervention SDE is of the
same type as (5.2) and relate the parameters of the generator for the solution of the
postintervention SDE to those of the generator for the solution of the observational
SDE. This is done in the following theorem.



Theorem 5.4. Consider the SDEs

\[
X_t^i = x_0^i + \sum_{j=1}^{d} \int_0^t a_{ij}(X_{s-}) \, dZ_s^j, \qquad i \le p, \tag{5.10}
\]

and

\[
Y_t^i = y_0^i + \sum_{j=1}^{\tilde d} \int_0^t \tilde a_{ij}(Y_{s-}) \, d\tilde Z_s^j, \qquad i \le p, \tag{5.11}
\]

where $Z$ is a $d$-dimensional Levy process, $\tilde Z$ is a $\tilde d$-dimensional Levy process and
the mappings $a : \mathbb{R}^p \to \mathbb{M}(p,d)$ and $\tilde a : \mathbb{R}^p \to \mathbb{M}(p,\tilde d)$ are Lipschitz and bounded.
Let $X$ and $Y$ be the unique solutions. If $X$ and $Y$ have the same distributions, then
the postintervention distributions of doing $X^m := c$ in (5.10) and doing $Y^m := c$
in (5.11) are equal for all $m$ and $c$.




Proof. It suffices to show that the postintervention distributions for the nonintervened
coordinates are the same. Fix a bounded neighborhood $D$ of zero in $\mathbb{R}^d$, a
bounded neighborhood $\tilde D$ of zero in $\mathbb{R}^{\tilde d}$ and a bounded neighborhood $E$ of zero
in $\mathbb{R}^p$. Assume that $Z$ has $D$-characteristics $(\alpha, C, \nu)$ and that $\tilde Z$ has $\tilde D$-characteristics
$(\tilde\alpha, \tilde C, \tilde\nu)$. By Theorem 5.3, $X$ and $Y$ are both Feller processes. As they
have the same distribution, we obtain that both the initial distributions and the
generators are the same for the two processes. It is then immediate that the initial
distributions for the postintervention distributions are equal as well.

For $x \in \mathbb{R}^p$, define $T_x^a : \mathbb{R}^d \to \mathbb{R}^p$ by $T_x^a(y) = a(x)y$ and $T_x^{\tilde a} : \mathbb{R}^{\tilde d} \to \mathbb{R}^p$ by
$T_x^{\tilde a}(y) = \tilde a(x)y$. Also define



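\[
\beta_i(x) = \sum_{j=1}^{d} a_{ij}(x) \alpha_j
 + \int \left( 1_{(T_x^a)^{-1}(E)}(y) - 1_D(y) \right) \sum_{j=1}^{d} a_{ij}(x) y_j \, d\nu(y), \tag{5.12}
\]

\[
\tilde\beta_i(x) = \sum_{j=1}^{\tilde d} \tilde a_{ij}(x) \tilde\alpha_j
 + \int \left( 1_{(T_x^{\tilde a})^{-1}(E)}(y) - 1_{\tilde D}(y) \right) \sum_{j=1}^{\tilde d} \tilde a_{ij}(x) y_j \, d\tilde\nu(y). \tag{5.13}
\]

Applying the form of the generator given in Theorem 5.3 and the uniqueness result
of Lemma 5.2, we find that for all $x \in \mathbb{R}^p$ and $i \le p$, we have

\[ \beta_i(x) = \tilde\beta_i(x), \tag{5.14} \]

\[ a(x) C a(x)^t = \tilde a(x) \tilde C \tilde a(x)^t, \tag{5.15} \]

\[ T_x^a(\nu) = T_x^{\tilde a}(\tilde\nu). \tag{5.16} \]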

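Next, we find that the postintervention SDEs for the nonintervened coordinates are
$X_t^i = x_0^i + \sum_{j=1}^{d} \int_0^t b_{ij}(X_{s-}) \, dZ_s^j$ and $Y_t^i = y_0^i + \sum_{j=1}^{\tilde d} \int_0^t \tilde b_{ij}(Y_{s-}) \, d\tilde Z_s^j$ for $i < p$, where
$b : \mathbb{R}^{p-1} \to \mathbb{M}(p-1, d)$ and $\tilde b : \mathbb{R}^{p-1} \to \mathbb{M}(p-1, \tilde d)$ are obtained as $b(x) = a(\xi_m(x))$
and $\tilde b(x) = \tilde a(\xi_m(x))$ with $\xi_m : \mathbb{R}^{p-1} \to \mathbb{R}^p$ being the mapping inserting $c$ on the
$m$'th coordinate. By Theorem 5.3, the distribution of the first process, excluding
the intervened coordinate, has a generator $B$ which on $C_c^2(\mathbb{R}^{p-1})$ is equal to

\[
\begin{aligned}
Bf(x) ={}& \sum_{i=1}^{p-1} \gamma_i(x) \frac{\partial f}{\partial x_i}(x)
 + \frac{1}{2} \sum_{i=1}^{p-1} \sum_{j=1}^{p-1} (b(x) C b(x)^t)_{ij} \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \\
 & + \int f(x+y) - f(x) - 1_E(y) \sum_{i=1}^{p-1} \frac{\partial f}{\partial x_i}(x) y_i \, dT_x^b(\nu)(y),
\end{aligned} \tag{5.17}
\]

and the distribution of the second process, again excluding the intervened coordinate,
has a generator $\tilde B$ which on $C_c^2(\mathbb{R}^{p-1})$ is equal to

\[
\begin{aligned}
\tilde Bf(x) ={}& \sum_{i=1}^{p-1} \tilde\gamma_i(x) \frac{\partial f}{\partial x_i}(x)
 + \frac{1}{2} \sum_{i=1}^{p-1} \sum_{j=1}^{p-1} (\tilde b(x) \tilde C \tilde b(x)^t)_{ij} \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \\
 & + \int f(x+y) - f(x) - 1_E(y) \sum_{i=1}^{p-1} \frac{\partial f}{\partial x_i}(x) y_i \, dT_x^{\tilde b}(\tilde\nu)(y),
\end{aligned} \tag{5.18}
\]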



where

\[
\gamma_i(x) = \sum_{j=1}^{d} b_{ij}(x) \alpha_j
 + \int \left( 1_{(T_x^b)^{-1}(E)}(y) - 1_D(y) \right) \sum_{j=1}^{d} b_{ij}(x) y_j \, d\nu(y), \tag{5.19}
\]

\[
\tilde\gamma_i(x) = \sum_{j=1}^{\tilde d} \tilde b_{ij}(x) \tilde\alpha_j
 + \int \left( 1_{(T_x^{\tilde b})^{-1}(E)}(y) - 1_{\tilde D}(y) \right) \sum_{j=1}^{\tilde d} \tilde b_{ij}(x) y_j \, d\tilde\nu(y). \tag{5.20}
\]

Next, noting that

\[ \gamma_i(x) = \beta_i(\xi_m(x)), \tag{5.21} \]

\[ b(x) C b(x)^t = a(\xi_m(x)) C a(\xi_m(x))^t, \tag{5.22} \]

\[ T_x^b(\nu) = T_{\xi_m(x)}^a(\nu), \tag{5.23} \]

and similarly for the other parameters, we may apply (5.14), (5.15) and (5.16) as
well as Lemma 5.2 to obtain that $B$ and $\tilde B$ agree on $C_c^2(\mathbb{R}^{p-1})$, and thus Lemma
5.1 yields that the postintervention distributions are equal. □



In words, Theorem 5.4 states that for SDE models with a Levy process as the
driving semimartingale, postintervention distributions are identifiable from the
observational distribution. Note that the requirement that $a$ and $\tilde a$ be bounded in
Theorem 5.4 is only used to ensure the Feller property.



Theorem 5.4 allows us, in the case of Levy noise, to lift the definition of
interventions from a framework of SDEs to a framework of Markov processes: As
all interventions made in SDEs with the same distribution will yield the same
postintervention distribution, we can construct the quotient mapping of taking
interventions relative to the corresponding Markov processes in the following way.
For simplicity, consider the case without jumps and assume that we are given a
Markov process whose generator restricted to $C_c^2(\mathbb{R}^p)$ can be written as

\[
Af(x) = \sum_{i=1}^{p} \frac{\partial f}{\partial x_i}(x) \sum_{j=1}^{p} a_{ij}(x) \alpha_j
 + \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} (a(x) C a(x)^t)_{ij} \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \tag{5.24}
\]

for some $\alpha \in \mathbb{R}^p$ and positive semidefinite $C \in \mathbb{M}(p,p)$. This is a generator of the
form (5.3). The postintervention distribution of this process by doing $X^m := c$
is obtained by letting the $m$'th coordinate be constant and letting the remaining
coordinates follow the distribution of a Markov process with the same initial
distribution as the original distribution and generator whose restriction to $C_c^2(\mathbb{R}^{p-1})$
is

\[
Bf(x) = \sum_{i=1}^{p-1} \frac{\partial f}{\partial x_i}(x) \sum_{j=1}^{p} b_{ij}(x) \alpha_j
 + \frac{1}{2} \sum_{i=1}^{p-1} \sum_{j=1}^{p-1} (b(x) C b(x)^t)_{ij} \frac{\partial^2 f}{\partial x_i \partial x_j}(x), \tag{5.25}
\]

where $b$, as in Definition 2.2, is defined by $b_{ij}(y_1, \ldots, y_p) = a_{ij}(y_1, \ldots, c, \ldots, y_p)$, with
the $c$ on the $m$'th coordinate. The results of Theorem 5.3 and Theorem 5.4 show
that this definition can be interpreted as having the process with generator $A$ arise
from some SDE with Wiener noise and making interventions as in Definition 2.2,
and this is well-defined in the sense that the interpretation yields the same result,
independent of the particular SDE.
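The passage from (5.24) to (5.25) only needs the drift vector and the diffusion matrix as functions of the state. A minimal sketch of this operation is given below. The function `intervene`, the zero-based index `m`, and the example coefficients `drift` and `diffusion` are hypothetical names and choices made for illustration; they do not appear in the original text.

```python
import numpy as np


def intervene(drift, diffusion, m, c):
    """Coefficients on R^(p-1) after doing X^m := c (m is a zero-based index).

    drift:     maps x in R^p to the drift vector a(x) alpha in R^p
    diffusion: maps x in R^p to the (p, p) matrix a(x) C a(x)^t
    """
    def xi(x):
        # Insert the constant c on the m'th coordinate.
        return np.insert(np.asarray(x, dtype=float), m, c)

    def post_drift(x):
        return np.delete(drift(xi(x)), m)

    def post_diffusion(x):
        full = diffusion(xi(x))
        return np.delete(np.delete(full, m, axis=0), m, axis=1)

    return post_drift, post_diffusion


# Hypothetical two-dimensional example.
def drift(x):
    return np.array([x[1] - x[0], -x[1]])


def diffusion(x):
    return np.array([[1.0 + x[1] ** 2, 0.5],
                     [0.5, 1.0]])


# Intervene on the second coordinate (index 1), fixing it at 2.0.
post_drift, post_diffusion = intervene(drift, diffusion, m=1, c=2.0)
print(post_drift([0.5]), post_diffusion([0.5]))
```

The sketch works directly with the generator coefficients, which mirrors the point of the paragraph above: no particular SDE representation is needed to form the postintervention generator.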



6. Interventions and weak conditional local independence 



In this section, we discuss the relationship between postintervention processes and 
weak conditional local independence (WCLI) of the observational process. We 
first review some results on random measures and semimartingale characteristics. 

Recall that a random measure on $\mathbb{R}_+ \times \mathbb{R}^d$ is a family of nonnegative measures
$(\mu(\omega, \cdot))_{\omega \in \Omega}$ such that $\mu(\omega, \{0\} \times \mathbb{R}^d) = 0$ for all $\omega$. Put $\Omega_d = \Omega \times \mathbb{R}_+ \times \mathbb{R}^d$,
$\mathcal{O}_d = \mathcal{O} \otimes \mathcal{B}_d$ and $\mathcal{P}_d = \mathcal{P} \otimes \mathcal{B}_d$, where $\mathcal{O}$ and $\mathcal{P}$ denote the optional and
predictable $\sigma$-algebras on $\Omega \times \mathbb{R}_+$, respectively. A mapping from $\Omega_d$ to $\mathbb{R}$ which is
$\mathcal{O}_d$ measurable is called an optional function, and a mapping from $\Omega_d$ to $\mathbb{R}$ which
is $\mathcal{P}_d$ measurable is called a predictable function. If we wish to make the filtration
$(\mathcal{F}_t)$ explicit, we refer to $(\mathcal{F}_t)$ optional and $(\mathcal{F}_t)$ predictable functions. Note
that as $\mathcal{O}_d \subseteq \mathcal{F} \otimes \mathcal{B}_+ \otimes \mathcal{B}_d$, it holds that for any optional function $W$ and any
fixed $\omega \in \Omega$, $(t,x) \mapsto W(\omega, t, x)$ is $\mathcal{B}_+ \otimes \mathcal{B}_d$ measurable. Therefore, the integral
$\int_{[0,t] \times \mathbb{R}^d} |W(\omega, s, x)| \, d\mu(\omega, ds, dx)$ is always well-defined. We write $(|W| * \mu)_t(\omega)$
for this integral. When $(|W| * \mu)_t(\omega)$ is finite for all $\omega$ and $t \ge 0$, we furthermore
define $(W * \mu)_t(\omega) = \int_{[0,t] \times \mathbb{R}^d} W(\omega, s, x) \, d\mu(\omega, ds, dx)$. If $W * \mu$ is optional for all
nonnegative bounded optional $\mu$-integrable functions $W$, we say that $\mu$ is optional.
If $W * \mu$ is predictable for all nonnegative bounded predictable $\mu$-integrable
functions $W$, we say that $\mu$ is predictable. For any optional random measure $\mu$, we
say that $\mu$ is $\mathcal{P}_d$-$\sigma$-finite if there is a partition $(A_n)_{n \ge 1}$ of $\Omega_d$ into $\mathcal{P}_d$ measurable sets
such that $E(1_{A_n} * \mu)_\infty$ is finite for every $n$.

By Theorem II.1.8 of [19], for any optional $\mathcal{P}_d$-$\sigma$-finite random measure $\mu$, there
exists a predictable random measure $\nu$, unique up to indistinguishability, such that
for all nonnegative bounded $\mathcal{P}_d$ measurable functions $W$, $E(W * \nu)_\infty = E(W * \mu)_\infty$.
We refer to $\nu$ as the compensator of $\mu$. Furthermore, Theorem II.1.8 of [19] also
shows that if $|W| * \mu$ is locally integrable, then $|W| * \nu$ is locally integrable as well.

We now introduce the characteristics of a $d$-dimensional semimartingale $X$. For
such a semimartingale, we define the jump measure $\mu^X$ for $X$ by letting $\mu^X(\omega)$ be
the measure on $\mathcal{B}_+ \otimes \mathcal{B}_d$ defined by

\[
\mu^X(\omega)(A) = \sum_{0 < t} 1_A(t, \Delta X_t(\omega)). \tag{6.1}
\]

By Proposition II.1.16 of [19], $\mu^X$ is optional and $\mathcal{P}_d$-$\sigma$-finite. Therefore, the
compensator of $\mu^X$ exists; we denote it by $\nu^X$. Furthermore, we define a mapping
$h^d : \mathbb{R}^d \to \mathbb{R}^d$ by letting $h^d(x) = x 1_{(\|x\|_2 \le 1)}$, the canonical truncation function.
Then $X_t - \sum_{0 < s \le t} (\Delta X_s - h^d(\Delta X_s))$ is a special semimartingale, and we let $B$ be its
predictable finite variation part. Finally, we let $C$ be the process with values in the
real symmetric $d \times d$ matrices given by $C_t^{ij} = [(X^i)^c, (X^j)^c]_t$, where $(X^i)^c$ denotes
the continuous martingale part of $X^i$, see Proposition I.4.27 of [19]. We then define
the $h^d$-characteristics of $X$ to be the triple $(B, C, \nu^X)$. For convenience, we will
also just refer to $(B, C, \nu^X)$ as the characteristics of $X$, suppressing the dependence
on $h^d$. By Remark II.2.8 of [19], for a fixed truncation function $h^d$, see Definition
II.2.3 of [19], the characteristics are unique up to indistinguishability.

Before proving our main result, we state a lemma. We remark that the calculation
of the characteristics in the proof of Lemma 6.1 is similar to the results given as
Proposition IX.5.3 of [19] and Lemma 2.5 of [20].

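Lemma 6.1. Let $K$ be a $d$-dimensional predictable and locally bounded process,
and define $Y_t = \sum_{j=1}^d \int_0^t K_s^j \, dZ_s^j$. Letting $(B^Z, C^Z, \nu^Z)$ be the $h^d$-characteristics
of $Z$, it holds that the $h^1$-characteristics $(B^Y, C^Y, \nu^Y)$ of $Y$ are given by

\[
B_t^Y = \sum_{j=1}^d \int_0^t K_s^j \, d(B^Z)_s^j + ((h^1 \circ H - H \circ h^d) * \nu^Z)_t, \tag{6.2}
\]

\[
C_t^Y = \sum_{j=1}^d \sum_{k=1}^d \int_0^t K_s^j K_s^k \, d(C^Z)_s^{jk}, \tag{6.3}
\]

\[
\nu^Y(\omega, A) = \int_{\mathbb{R}_+ \times \mathbb{R}^d} 1_A(t, H(x)_t(\omega)) \, d\nu^Z(\omega, dt, dx), \tag{6.4}
\]

where $A \in \mathcal{B}_+ \otimes \mathcal{B}$ and $H(x)_t(\omega) = \sum_{j=1}^d K_t^j(\omega) x_j$.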
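The formulas (6.2), (6.3) and (6.4) can be made concrete in a small worked instance. The setting below is an illustration only and is not taken from the original text: the choice of $Z$, the rate $\lambda$ and the constant integrand $k$ are assumptions.

```latex
% Assumed setting: d = 2 and Z = (W, N/2), where W is a standard Brownian
% motion and N is an independent Poisson process with rate \lambda > 0.
% The integrand is constant, K_t = k = (k_1, k_2), with k_2 \neq 0 and |k_2| \neq 2.
%
% Every jump of Z equals (0, 1/2), which has norm 1/2, so h^2(\Delta Z_s) = \Delta Z_s and
\[
B_t^Z = (0, \lambda t / 2), \qquad
C_t^Z = \begin{pmatrix} t & 0 \\ 0 & 0 \end{pmatrix}, \qquad
\nu^Z(dt, dx) = \lambda \, dt \otimes \delta_{(0, 1/2)}(dx).
\]
% Here H(x) = k_1 x_1 + k_2 x_2 and Y_t = k_1 W_t + (k_2 / 2) N_t, so (6.2)-(6.4) give
\begin{align*}
B_t^Y &= \frac{k_2 \lambda t}{2} + \lambda t \left( h^1(k_2/2) - \frac{k_2}{2} \right)
       = \lambda t \, h^1(k_2/2), \\
C_t^Y &= k_1^2 t, \\
\nu^Y(dt, dy) &= \lambda \, dt \otimes \delta_{k_2/2}(dy).
\end{align*}
% If |k_2| < 2, the jumps of Y survive the truncation and B_t^Y = \lambda k_2 t / 2.
% If |k_2| > 2, every jump of Y is removed by h^1 and B_t^Y = 0.
```

The second case shows why the correction term $(h^1 \circ H - H \circ h^d) * \nu^Z$ in (6.2) is needed: the truncation applied to $Y$ need not match the truncation applied to $Z$.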

Proof. We begin by calculating an expression for the first characteristic, $B^Y$. To
do so, we identify the predictable finite variation part of the special semimartingale
$Y_t - \sum_{0 < s \le t} (\Delta Y_s - h^1(\Delta Y_s))$. Note that $(\omega, t, x) \mapsto H(x)_t(\omega)$ is predictable. By
the definition of $B^Z$, there exists a $d$-dimensional local martingale $M$ such that
$Z_t = Z_0 + B_t^Z + M_t + \sum_{0 < s \le t} (\Delta Z_s - h^d(\Delta Z_s))$. We then obtain the decomposition
$Y_t = A_t + \sum_{j=1}^d \int_0^t K_s^j \, dM_s^j$, where the latter is a local martingale and

\[
\begin{aligned}
A_t &= \sum_{j=1}^d \int_0^t K_s^j \, d(B^Z)_s^j + \sum_{j=1}^d \sum_{0 < s \le t} K_s^j \left( \Delta Z_s^j - h^d(\Delta Z_s)^j \right) \\
 &= \sum_{j=1}^d \int_0^t K_s^j \, d(B^Z)_s^j + \sum_{0 < s \le t} H(\Delta Z_s)_s - H(h^d(\Delta Z_s))_s \\
 &= \sum_{j=1}^d \int_0^t K_s^j \, d(B^Z)_s^j + ((H - H \circ h^d) * \mu^Z)_t,
\end{aligned} \tag{6.5}
\]

understanding that $H - H \circ h^d$ here denotes $(\omega, t, x) \mapsto H(x)_t(\omega) - H(h^d(x))_t(\omega)$,
which is a predictable function, and the integral with respect to $\mu^Z$ is finite by
taking absolute values and calculating backwards. With similar notation, we also
obtain

\[
\begin{aligned}
\sum_{0 < s \le t} \Delta Y_s - h^1(\Delta Y_s)
 &= \sum_{0 < s \le t} \sum_{j=1}^d K_s^j \Delta Z_s^j - h^1\Big( \sum_{j=1}^d K_s^j \Delta Z_s^j \Big) \\
 &= \sum_{0 < s \le t} H(\Delta Z_s)_s - h^1(H(\Delta Z_s)_s) = ((H - h^1 \circ H) * \mu^Z)_t.
\end{aligned} \tag{6.6}
\]

Therefore, we obtain $Y_t - \sum_{0 < s \le t} (\Delta Y_s - h^1(\Delta Y_s)) = \tilde A_t + \sum_{j=1}^d \int_0^t K_s^j \, dM_s^j$, where
$\tilde A$ is the finite variation process given by

\[
\tilde A_t = \sum_{j=1}^d \int_0^t K_s^j \, d(B^Z)_s^j + ((h^1 \circ H - H \circ h^d) * \mu^Z)_t. \tag{6.7}
\]



Now define $B_t^Y = \sum_{j=1}^d \int_0^t K_s^j \, d(B^Z)_s^j + ((h^1 \circ H - H \circ h^d) * \nu^Z)_t$, where the
integral with respect to $\nu^Z$ is well-defined as integrability with respect to $\mu^Z$
implies integrability with respect to $\nu^Z$. As $h^1 \circ H - H \circ h^d$ is a predictable
function, the latter term is predictable. And as $B^Z$ is predictable, the process
$\int_0^t K_s^j \, d(B^Z)_s^j$ only jumps at predictable times $T$, and the jump is $K_T^j \Delta (B^Z)_T^j$,
which is $\mathcal{F}_{T-}$ measurable by Corollary 3.23 of [17]. Therefore, Theorem 3.33 of
[17] shows that $\int_0^t K_s^j \, d(B^Z)_s^j$ is predictable, and thus $B^Y$ is predictable. Thus, $B^Y$
is the predictable finite variation part of the process $Y_t - \sum_{0 < s \le t} (\Delta Y_s - h^1(\Delta Y_s))$
and is therefore the first characteristic of $Y$.

As regards $C^Y$, note that by Theorem 9.3 of [17], $Y_t^c = \sum_{j=1}^d \int_0^t K_s^j \, d(Z^j)_s^c$. Thus,
we immediately obtain $C_t^Y = \sum_{j=1}^d \sum_{k=1}^d \int_0^t K_s^j K_s^k \, d(C^Z)_s^{jk}$.

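It remains to calculate the third characteristic. For all $A \in \mathcal{B}_+ \otimes \mathcal{B}$, we have

\[
\begin{aligned}
\mu^Y(\omega, A) &= \sum_{0 < t} 1_A(t, \Delta Y_t(\omega)) = \sum_{0 < t} 1_A\Big( t, \sum_{j=1}^d K_t^j(\omega) \Delta Z_t^j(\omega) \Big) \\
 &= \sum_{0 < t} 1_A(t, H(\Delta Z_t(\omega))_t(\omega)) = \int 1_A(t, H(x)_t(\omega)) \, d\mu^Z(\omega, dt, dx).
\end{aligned} \tag{6.8}
\]

Now define $\nu^Y(\omega, A) = \int_{\mathbb{R}_+ \times \mathbb{R}^d} 1_A(t, H(x)_t(\omega)) \, d\nu^Z(\omega, dt, dx)$. We wish to argue
that $\nu^Y$ is the compensator of the jump measure of $Y$. To this end, we first show
that $\nu^Y$ is predictable. By Section VI.16 of [28], $\mathcal{P}$ is generated by the family
$[T, \infty[$ for $T$ a predictable stopping time. Therefore, $\mathcal{P}_1$ is generated by sets of
the form $[T, \infty[ \times C$, where $C \in \mathcal{B}$. By a monotone convergence argument, we
then obtain that in order to prove that $\nu^Y$ is predictable, it suffices to show that
$1_{[T,\infty[ \times C} * \nu^Y$ is predictable for all predictable stopping times $T$ and all $C \in \mathcal{B}$.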






To do so, fix a predictable stopping time $T$ and a set $C \in \mathcal{B}$. We then have

\[
\begin{aligned}
(1_{[T,\infty[ \times C} * \nu^Y)_t(\omega)
 &= \int_{[0,t] \times \mathbb{R}^d} 1_{[T(\omega),\infty) \times C}(s, H(x)_s(\omega)) \, d\nu^Z(\omega, ds, dx) \\
 &= \int_{[0,t] \times \mathbb{R}^d} 1_{[T,\infty[ \times C}(\omega, s, H(x)_s(\omega)) \, d\nu^Z(\omega, ds, dx).
\end{aligned} \tag{6.9}
\]

Now note that the mapping $(\omega, t, x) \mapsto H(x)_t(\omega)$ is $\mathcal{P} \otimes \mathcal{B}_d$-$\mathcal{B}$ measurable. From
this, we conclude that $(\omega, t, x) \mapsto (\omega, t, H(x)_t(\omega))$ is $\mathcal{P} \otimes \mathcal{B}_d$-$\mathcal{P} \otimes \mathcal{B}$ measurable. As
$[T, \infty[ \times C \in \mathcal{P} \otimes \mathcal{B}$, $(\omega, t, x) \mapsto 1_{[T,\infty[ \times C}(\omega, t, x)$ is $\mathcal{P} \otimes \mathcal{B}$-$\mathcal{B}$ measurable. We conclude
that $(\omega, t, x) \mapsto 1_{[T,\infty[ \times C}(\omega, t, H(x)_t(\omega))$ is $\mathcal{P}_d$-$\mathcal{B}$ measurable, thus a predictable
function, so as $\nu^Z$ is predictable, $1_{[T,\infty[ \times C} * \nu^Y$ is predictable, so $\nu^Y$ is predictable.
It remains to prove that $E(W * \nu^Y)_\infty = E(W * \mu^Y)_\infty$ for all nonnegative bounded
predictable functions $W$. Again, it suffices to consider predictable functions of the
form $1_{[T,\infty[ \times C}$. However, rewriting the integrand as a predictable function as in
(6.9), this follows immediately from the fact that $\nu^Z$ is the compensator of $\mu^Z$. □

Theorem 6.2. Assume that $Z$ is a Levy process, and assume that for some $c \in \mathbb{R}$,
$X^i = Y^i$ almost surely, where $Y$ is the postintervention process of doing $X^m := c$.
Let $(B, C, \nu)$ be the semimartingale characteristics of $X^i$. Let $(\mathcal{F}_t^{-m})_{t \ge 0}$ be the
usual augmentation of the filtration induced by the processes $X^1, \ldots, X^p$ excluding
$X^m$. Then $B$, $C$ and $\nu$ are $(\mathcal{F}_t^{-m})$ predictable.

Proof. By Theorem II.4.15 of [19], $Z$ being a Levy process implies the existence
of a deterministic version $(B^Z, C^Z, \nu^Z)$ of the characteristics of $Z$. In particular,
$B^Z$, $C^Z$ and $\nu^Z$ are all $(\mathcal{F}_t^{-m})$ predictable. And by our assumptions, we have
$X_t^i = x_0^i + \sum_{j=1}^d \int_0^t K_s^j \, dZ_s^j$ for some $c \in \mathbb{R}$, where $K_s^j = b_{ij}(Y_{s-})$. In particular,
$K^j$ is $(\mathcal{F}_t^{-m})$ predictable and locally bounded.

With $H(x)_t(\omega) = \sum_{j=1}^d K_t^j(\omega) x_j$, we then find that $(\omega, t, x) \mapsto H(x)_t(\omega)$ is a
$(\mathcal{F}_t^{-m})$ predictable function. By (6.2) and Theorem 3.33 of [17], we then obtain
that $B$ is $(\mathcal{F}_t^{-m})$ predictable. As regards the second characteristic, (6.3) shows
that $C$ is continuous and $(\mathcal{F}_t^{-m})$ adapted, therefore $(\mathcal{F}_t^{-m})$ predictable. Finally,
by the same argument as in the proof of Lemma 6.1, we find that for any $(\mathcal{F}_t^{-m})$
predictable stopping time $T$ and $C \in \mathcal{B}$, $1_{[T,\infty[ \times C} * \nu$ is $(\mathcal{F}_t^{-m})$ predictable, and so
$\nu$ is $(\mathcal{F}_t^{-m})$ predictable. □



In words, Theorem 6.2 states that under certain assumptions on the driving
semimartingale, having $X^i$ locally unaffected by an intervention in $X^m$ yields that $X^i$ is
locally independent of $X^m$ - a claim made precise in the sense that the
characteristics of $X^i$ are $(\mathcal{F}_t^{-m})$ predictable.

We now relate this to the framework of weak conditional local independence. In
[14], the following definition of weak conditional local independence is made.
Assume that $Y$ is a $d$-dimensional special semimartingale with decomposition
$Y = Y_0 + A + M$, where $A$ is predictable and of finite variation and $M$ is a
local martingale. Let $(B, C, \nu)$ be the characteristics of $Y$. In [14] it is further
assumed that the coordinates of $M$ have zero quadratic covariation and that the
characteristic $C$ is deterministic. In this case, Definition 2 of [14] states that $X^i$ is
weakly conditionally locally independent (WCLI) of $X^m$ if the characteristics $B^i$
and $\nu^i$ of $X^i$ are $(\mathcal{F}_t^{-m})$ predictable. This definition is well-posed whenever the
characteristics $(B, C, \nu)$ are unique. Therefore, it can be extended to all special
semimartingales. Making this extension, we obtain the following theorem.

Theorem 6.3. Assume that $Z$ is a Levy process. Then $X$ is a special semimartingale.
Let $Y$ be the postintervention process obtained by doing $X^m := c$. If $X^i = Y^i$
almost surely, then $X^i$ is WCLI of $X^m$.



Proof. By Lemma 6.1, there is a predictable version of the finite variation part
of $X$. Therefore, $X$ is a special semimartingale. Theorem 6.2 then yields the
result. □



7. Discussion 



In this section, we will reflect on the results of the preceding sections and discuss
opportunities for further work.



The definition of the postintervention SDE, Definition 2.2, is certainly an obvious
way to define how interventions should affect stochastic dynamic systems.
However, the definition reflects unstated assumptions about causality, and it is
important to make precise whether the definition can be assumed to reflect an actual
real-world intervention or whether the definition is simply a mathematical construct.
This is clarified in Section 4, where we used the DAG-based intervention calculus
to show that the postintervention SDE of Definition 2.2 can be assumed to reflect
real-world interventions when the following hold:

(1) The SDE reflects a data-generating mechanism in which the variables at a 
given timepoint are obtained as a function of the previous timepoints and 
the driving semimartingales. 

(2) The driving semimartingales are locally unaffected by interventions. 
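The two conditions above can be illustrated with an Euler scheme. The sketch below is an assumption-laden illustration, not the authors' construction: the function `euler_path`, the coefficient `a`, the step size and the use of Wiener increments are all hypothetical choices. Each state is computed from the previous state and the noise increment, and the intervention clamps one coordinate while the same noise increments are reused.

```python
import numpy as np


def euler_path(a, x0, dZ, intervention=None):
    """Euler scheme X_{k+1} = X_k + a(X_k) dZ_k; optionally clamp coordinate m at c."""
    x = np.array(x0, dtype=float)
    if intervention is not None:
        m, c = intervention  # m is a zero-based coordinate index
        x[m] = c
    path = [x.copy()]
    for dz in dZ:
        x = x + a(x) @ dz
        if intervention is not None:
            x[m] = c  # the intervened coordinate is held fixed at every step
        path.append(x.copy())
    return np.array(path)


def a(x):
    # Hypothetical Lipschitz and bounded coefficient: the first coordinate feeds into the second.
    return np.array([[1.0, 0.0],
                     [np.tanh(x[0]), 1.0]])


rng = np.random.default_rng(0)
n_steps, dt = 1000, 1e-3
dZ = rng.normal(scale=np.sqrt(dt), size=(n_steps, 2))  # Wiener increments

observational = euler_path(a, [0.0, 0.0], dZ)
postintervention = euler_path(a, [0.0, 0.0], dZ, intervention=(0, 0.5))
```

Reusing `dZ` in both calls encodes condition (2): the driving noise is not changed by the intervention, only the way the clamped coordinate enters the update of the remaining coordinate.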

In full generality, the causal mechanisms of a model are not identifiable from
the observational distribution, see [33]. However, when considering only restricted
classes of structural equation models, the underlying causal mechanisms may often
be identifiable, see for example [35], [18] and [23]. In such cases, linearity of
the functional relationships or Gaussianity of the noise variables often determine
identifiability. In our case, as shown in Section 5, identifiability holds whenever the
driving semimartingale is a Levy process. This ensures practical applicability of
our results. The proofs given in Section 5 use the Markov structure of the solution
to the SDE. In the case where the driving semimartingale has independent, but
not stationary, increments, the solution to the SDE will be a non-homogeneous
Markov process, thus also amenable to operator methods, though requiring more
powerful technical results. We expect that Theorem 5.4 extends to this case.
Likewise, Theorem 6.2 and Theorem 6.3 also extend to the case of increments
that are independent but not stationary, as can be seen by the fact that Theorem
II.4.15 of [19] also holds for such processes.

It should also be noted that identifiability holds independently of the dimension
of the driving Levy process. This is useful, for instance, in relation to Example
2.1. We do not need to use the specific SDE driven by a four-dimensional Wiener
process. We can replace the diffusion term in the SDE by a term involving the
positive definite square root of the diffusion matrix and a two-dimensional Wiener
process without affecting the postintervention distribution.
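The replacement described here can be checked numerically at the level of diffusion matrices. In the sketch below, the 2 x 4 coefficient `a`, the helper names and the evaluation point are hypothetical and are not the coefficients of Example 2.1. The code compares the postintervention diffusion matrix obtained from `a` (four-dimensional Wiener noise) with the one obtained from the symmetric square root `a_tilde` (two-dimensional Wiener noise).

```python
import numpy as np


def sym_sqrt(S):
    """Symmetric positive semidefinite square root via an eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def a(x):
    # Hypothetical 2 x 4 coefficient driven by a four-dimensional Wiener process.
    return np.array([[1.0, np.sin(x[1]), 0.0, 0.5],
                     [0.0, 1.0, np.cos(x[0]), 0.2]])


def a_tilde(x):
    # 2 x 2 coefficient driven by a two-dimensional Wiener process.
    return sym_sqrt(a(x) @ a(x).T)


def post_diffusion(coef, x_rest, m, c):
    """Diffusion matrix b(x) b(x)^t of the nonintervened coordinates after X^m := c."""
    xi = np.insert(np.asarray(x_rest, dtype=float), m, c)
    b = np.delete(coef(xi), m, axis=0)
    return b @ b.T


x_rest, m, c = [0.3], 0, 1.0  # intervene on the first coordinate (zero-based index 0)
S_a = post_diffusion(a, x_rest, m, c)
S_tilde = post_diffusion(a_tilde, x_rest, m, c)
print(np.allclose(S_a, S_tilde))
```

Both coefficients give the same matrix because removing row `m` from a coefficient removes row and column `m` from its diffusion matrix, and the two coefficients share the same diffusion matrix before the intervention.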

It is, however, important to be careful about the interpretation of the identifiability
result. The result states that when using Definition 2.2 to model interventions, the
postintervention distributions are identifiable. As discussed above, Definition 2.2
is not always useful as a notion of intervention: This requires that we are willing
to interpret the SDE in a particular way. As Example 4.6 shows, not all SDEs are
amenable to such an interpretation - it requires separate arguments, such as in
Example 2.1.

A complete theory of interventions in continuous time stochastic processes should
be able to cover cases such as Example 4.6. Our results should be seen as a step
in the direction of a complete theory and encourage further generalizations.
Another opportunity for further research concerns latent variables: In the DAG-based
framework of [22], the back-door and front-door criteria show how to calculate
intervention effects from the observational distribution in the presence of latent
variables. For an SDE, the causal structure is summarized in the signature, see
Definition 4.1, which does not need to be acyclic, reflecting the possibility of
feedback loops. It is an open question how to obtain similar results in terms of the
signature in the case of, for example, a diffusion model with some coordinates
being unobserved.



References 

1. O. O. Aalen, K. Røysland & J. M. Gran: Causality, mediation and time: A dynamic
viewpoint, J. R. Statist. Soc. A, 2012.

2. D. Anderson, T. Kurtz: Continuous time markov chain models for chemical reaction net- 
works. In: H. Koeppl, G. Setti, M. di Bernardo, D. Densmore (eds.) Design and Analysis of 
Biomolecular Circuits, pp. 3-42, Springer, 2011. 

3. D. Applebaum: Levy processes and stochastic calculus, Cambridge University Press, 2009. 

4. D. Commenges & A. Gegout-Petit: A general dynamical statistical model with causal inter- 
pretation, J. R. Statist. Soc. B (2009), 71(3), p. 719-736. 




5. F. Comte & E. Renault: Noncausality in Continuous Time Models, Econometric Theory, 
Vol. 12(2), p. 215-256, 1996. 

6. A. P. Dawid: Causal inference without counterfactuals, Journal of the ASA, 2000. 

7. V. Didelez: Graphical models for marked point processes based on local independence, Jour- 
nal of the Royal Statistical Society, Series B, 70, 245-264, 2008. 

8. S. N. Ethier & T. G. Kurtz: Markov Processes: Characterization and Convergence, Wiley, 
1986. 

9. M. Eichler: Granger causality and path diagrams for multivariate time series. J Econometr 
137: 334-353, 2007. 

10. M. Eichler: Graphical modelling of multivariate time series, Probab. Theory Relat. Fields 
(2012) 153:233-268. 

11. M. Eichler: Causal Inference with Multiple Time Series: Principles and Problems, forthcom- 
ing, Phil. Trans. R. Soc. A., 2013. 

12. M. Eichler & V. Didelez: On Granger causality and the effect of interventions in time series, 
Lifetime Data Anal., 2009. 

13. J.-P. Florens & D. Fougere: Noncausality in Continuous Time, Econometrica, Vol. 64(5), p.
1195-1212, 1996.

14. A. Gegout-Petit & D. Commenges: A general definition of influence between stochastic 
processes, Lifetime Data Anal. 16 p. 33-44, 2010. 

15. J. B. Gill & L. Petrovic: Causality and Stochastic Dynamic Systems, SIAM Journ. Appl. 
Math., Vol. 47(6), p. 1361-1366, 1987. 

16. G. Grubb: Distributions and operators, Springer-Verlag, 2008.

17. He, S.-W., Wang, J.-G. & Yan, J.-A.: Semimartingale Theory and Stochastic Calculus,
Science Press, CRC Press Inc., 1992.

18. P. O. Hoyer et al.: Nonlinear causal discovery with additive noise models, in Advances in
Neural Information Processing Systems 21 (NIPS), p. 689-696, MIT Press, 2009.

19. Jacod, J. & Shiryaev, A.: Limit Theorems for Stochastic Processes, Springer-Verlag, 2003.

20. J. Kallsen & A. N. Shiryaev: Time Change Representation of Stochastic Integrals. Theory
of Probability and Its Applications, 46:522-528, 2002. 

21. R. Meise & D. Vogt: Introduction to Functional Analysis, Oxford Science Publications, 1997. 

22. J. Pearl: Causality, 2nd edition, Cambridge University Press, 2009. 

23. J. Peters & P. Bühlmann: Identifiability of Gaussian Structural Equation Models with Same
Error Variances, URL: http://arxiv.org/abs/1205.2536, 2012.

24. L. Petrovic, S. Dimitrijevic: Invariance of statistical causality under convergence. Statist. 
Probab. Lett. 81, 1445-1448, 2011. 

25. L. Petrovic, D. Stanojevic: Statistical causality, extremal measures and weak solutions of 
stochastic differential equations with driving semimartingales. J. Math. Model. Algor. 9, 
113-128, 2010. 

26. Protter, P.: Stochastic Integration and Differential Equations, 2nd edition, Springer, 2005. 

27. Rogers, L. C. G. and Williams, D.: Diffusions, Markov Processes and Martingales, volume
1, Cambridge University Press, 2000.

28. Rogers, L. C. G. and Williams, D.: Diffusions, Markov Processes and Martingales, volume
2, Cambridge University Press, 2000.

29. W. Rudin: Real and Complex Analysis, McGraw-Hill, 3rd edition, 1987. 

30. K. Røysland: A martingale approach to continuous time marginal structural models.
Bernoulli. ISSN 1350-7265. 17(3), p. 895-915, 2011.

31. K. Røysland: Counterfactual analysis with graphical models based on local independence,
forthcoming in Annals of Statistics, 2013.

32. P. Spirtes et al.: Causation, Prediction and Search, Springer, 2011. 

33. T. Verma & J. Pearl: Equivalence and synthesis of causal models. In Proceedings of the 6th 
Annual Conference on Uncertainty in Artificial Intelligence (UAI), 1991. 

34. Wilkinson, Darren: Stochastic Modelling for Systems Biology, second edition, Chapman & 
Hall/CRC, 2011. 



35. K. Zhang & A. Hyvärinen: On the Identifiability of the Post-Nonlinear Causal Model, in
Proceedings of the 25th Annual Conference on Uncertainty in Artificial Intelligence (UAI),
2009.



Alexander Sokol: Institute of Mathematics, University of Copenhagen, 2100 Copen- 
hagen, Denmark, alexander@math.ku.dk 



Niels Richard Hansen: Institute of Mathematics, University of Copenhagen, 2100 
Copenhagen, Denmark, niels.r.hansen@math.ku.dk 



